SUMMARY


The purpose of this paper is to discuss the asymptotic
behavior of the sequence $\{f_N(i)\}$ generated by the nonlinear
recurrence relation

$f_N(i) = \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) f_{N-1}(j) \Big].$

This problem arises in connection with an equipment replacement
problem, cf. S. Dreyfus, A Note on an Industrial Replacement
Process, RAND, P-1045, 1957.
A MARKOVIAN DECISION PROCESS
By

Richard Bellman


§1. Introduction.

The purpose of this paper is to discuss the asymptotic
behavior of the sequence $\{f_N(i)\}$, $i = 1, 2, \ldots, M$,
$N = 1, 2, \ldots$, generated by the non-linear recurrence relations

(1)  $f_N(i) = \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) f_{N-1}(j) \Big], \quad N = 1, 2, \ldots,$

     $f_0(i) = c_i, \quad i = 1, 2, \ldots, M.$

Although these equations are nonlinear, they possess certain
quasi-linear properties which permit a more thorough discussion
than might be imagined upon first glance.
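
For a finite set of decisions, recurrence (1) can be computed directly. The following is a minimal sketch, not part of the original paper. It assumes that the same decisions q = 0, 1, ... are available in every state (the paper allows the admissible set to depend on i), and the names `bellman_step` and `iterate`, as well as the two-state data, are hypothetical.

```python
def bellman_step(f_prev, b, A):
    """One application of recurrence (1).

    b[q][i] plays the role of b_i(q); A[q][i][j] plays the role of a_ij(q).
    """
    M = len(f_prev)
    return [
        max(
            b[q][i] + sum(A[q][i][j] * f_prev[j] for j in range(M))
            for q in range(len(b))
        )
        for i in range(M)
    ]


def iterate(c, b, A, N):
    """Return [f_0, f_1, ..., f_N], starting from f_0(i) = c[i]."""
    history = [list(c)]
    for _ in range(N):
        history.append(bellman_step(history[-1], b, A))
    return history


# Hypothetical data: M = 2 states, two decisions, rows of each A[q] sum to 1.
b = [[1.0, 0.0], [0.5, 0.5]]
A = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8]],
]
history = iterate([0.0, 0.0], b, A, 200)
# Ratio f_N(i) / N for N = 200, one entry per state.
growth = [value / 200 for value in history[-1]]
```

For this data a hand calculation gives the common growth rate 9/14, so both entries of `growth` should lie close to that value.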

As we shall discuss below, this question arises from the
consideration of a dynamic programming process. A related
process gave rise to an equation of the above form which was
discussed in [1], under a particular set of assumptions
concerning the functions $\{b_i(q)\}$ and the matrices $A(q) = (a_{ij}(q))$.
Here we shall impose restrictions of a quite different type.

Any  complete  discussion  of  relations  of  the  foregoing  type  is 
at  least  as  detailed  as  a  corresponding  discussion  of  the  linear 
case,  and  as  in  the  linear  case,  the  assumptions  made  to  a 
considerable  extent  determine  the  techniques  employed. 

We shall discuss elsewhere some interesting quadratically
nonlinear recurrence relations which arise from specializing the
form of $b_i(q)$ and $a_{ij}(q)$. These are related to the types of
differential equations discussed in [5], being essentially a
particular type of discrete version.

It will be clear from what follows that similar techniques
can be utilized to treat the determination of the asymptotic
behavior of the sequences defined by

(2)  $f_N(x) = \max_q \Big[ b(x,q) + \int_0^1 K(x,y,q) f_{N-1}(y)\, dy \Big], \quad N = 1, 2, \ldots,$

     $f_0(x) = c(x), \quad 0 \le x \le 1,$

and by the equation

(3)  $u_N = \max_q \Big[ b(q) + \sum_{i} a_i(q) u_{N-i} \Big], \quad N = k+1, \ldots,$

     $u_i = c_i, \quad i = 0, 1, \ldots, k,$

under corresponding assumptions.


§2. Statement of Results.

We shall suppose that the functions $b_i(q)$ and $a_{ij}(q)$ satisfy
either of the following sets of conditions:

(1)  A. The functions $b_i(q)$ and $a_{ij}(q)$ are functions of finite
        dimensional vectors $q$ whose components assume only a
        finite set of values, which in general depend upon $i$
        and $j$.

     B. The functions $b_i(q)$ and $a_{ij}(q)$ are continuous functions
        of finite dimensional vectors whose components assume
        values in certain closed, bounded regions in $q$-space,
        which, in general, depend upon $i$ and $j$.

Either of these sets of conditions ensures that the maximum
is assumed in the recurrence relations of (1.1).

Our principal result is

Theorem. Let us assume that either (1A) or (1B) is satisfied
and that

(2)  a. $b_i(q) \ge 0$, and $b_i(q) > 0$ for some $i$ for all $q$,

     b. $a_{ij}(q) \ge d > 0$, $i, j = 1, 2, \ldots, M$, for all $q$,

     c. $\sum_{j=1}^{M} a_{ij}(q) = 1$, $i = 1, 2, \ldots, M$.

In other words, $A(q)$ is for each $q$ the transpose of a positive
Markoff matrix.

Under these conditions, we have the asymptotic result

(3)  $f_N(i) \sim N r, \quad N \to \infty, \quad i = 1, 2, \ldots, M,$


q  appearing  in  all  the  relations  of  (1.1) 
§3. Preliminaries on Markoff Matrices.

Before proceeding to the proof of this result, let us
note for future reference some known results concerning the
asymptotic behavior of the iterates of the transpose of a
positive Markoff matrix.

If $y$ is a non-negative non-trivial vector, we have

(1)  $\sum_{k=0}^{n-1} A^k y \sim n r \mathbf{1},$

where $r$ is a scalar quantity dependent upon $y$, and $\mathbf{1}$, as above,
denotes the vector defined in (2.5). Furthermore, the limit

(2)  $\lim_{n \to \infty} \Big( \sum_{k=0}^{n-1} A^k y - n r \mathbf{1} \Big) = x$

exists, and yields a vector $x$ which satisfies the system of linear
equations

(3)  $r \mathbf{1} + x = y + A x.$

These results are well-known from the theory of Markoff chains,
or else may be viewed as simple consequences of Perron's theorem
asserting the existence of a positive characteristic root of
largest absolute value of a positive matrix. The associated
characteristic vector may be taken to be $\mathbf{1}$.
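
The relations of this section can be checked numerically. The sketch below is an illustration only and is not taken from the paper. It assumes that $\mathbf{1}$ is the vector with all components equal to 1, that the quantity in (1) and (2) is the partial sum $y + Ay + \cdots + A^{n-1}y$, and it uses the standard fact that for a positive matrix with unit row sums $r$ equals the stationary row vector applied to $y$. The function names and the 2 x 2 matrix are hypothetical.

```python
def mat_vec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def growth_rate_and_limit(A, y, n=200):
    """Estimate r and x of (1)-(3) for a positive matrix with unit row sums."""
    M = len(y)
    # Stationary row vector pi (pi A = pi), by repeated multiplication.
    pi = [1.0 / M] * M
    for _ in range(n):
        pi = [sum(pi[i] * A[i][j] for i in range(M)) for j in range(M)]
    r = sum(p * v for p, v in zip(pi, y))
    # Partial sum y + A y + ... + A^(n-1) y.
    term, total = list(y), [0.0] * M
    for _ in range(n):
        total = [t + u for t, u in zip(total, term)]
        term = mat_vec(A, term)
    x = [t - n * r for t in total]
    return r, x


A = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical positive matrix, rows sum to 1
y = [1.0, 0.0]
r, x = growth_rate_and_limit(A, y)
Ax = mat_vec(A, x)
# Residual of relation (3): r + x_i - (y_i + (A x)_i), one entry per i.
residual = [r + x[i] - (y[i] + Ax[i]) for i in range(2)]
```

For this data a hand calculation gives r = 2/3 and x = (10/9, -20/9), and the entries of `residual` should vanish up to rounding error.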

§4.  Proof  of  Theorem  I. 

We begin the proof of the theorem with a discussion of the
linear system of (3.3). Let $\bar{q}$ denote a value of $q$ which maximizes
in (2.4). It is easy to show that the assumptions we have made
concerning the range of values of $q$ ensure the existence of a
maximizing $q$. Let $r = r(\bar{q})$ denote the maximum value of $r$ determined
by $\bar{q}$, and let $x = x(\bar{q})$ denote the vector determined by (3.2).
Then we have the system of equations

(1)  $r + x_i = b_i(\bar{q}) + \sum_{j=1}^{M} a_{ij}(\bar{q}) x_j, \quad i = 1, 2, \ldots, M.$

Actually, each $\bar{q}$ above should be $\bar{q}_i$, but we feel no confusion
will result if we omit this subscript and use a generic $\bar{q}$.

Our first task is to show that this linear system is
equivalent to the non-linear system

(2)  $r + x_i = \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) x_j \Big], \quad i = 1, 2, \ldots, M,$

in the sense that the set of $x_i$ satisfying (1) also satisfies (2).

It is clear to begin with that

(3)  $r + x_i \le \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) x_j \Big], \quad i = 1, 2, \ldots, M.$

If the $x_i$ do not satisfy (2), there will be strict inequality
in at least one of the relations in (3). Without loss of
generality, let the strict inequality occur in the first relation.
Finally, let $q'$ be a value of $q$ yielding the maximum on the right
side of (2). Again we drop the subscripts in $q'$ and use a
generic symbol, to simplify the notation.

We then have the inequalities

(4)  $r + x_1 < b_1(q') + \sum_{j=1}^{M} a_{1j}(q') x_j,$

     $r + x_i \le b_i(q') + \sum_{j=1}^{M} a_{ij}(q') x_j, \quad i = 2, \ldots, M.$

The first inequality can be strengthened to read

(5)  $(1 + a) r + x_1 \le b_1(q') + \sum_{j=1}^{M} a_{1j}(q') x_j,$

where $a$ is a positive quantity.

Let us now iterate these inequalities. We obtain

(6)  $r + x_i \le b_i(q') + \sum_{j=1}^{M} a_{ij}(q') b_j(q') + \sum_{j=1}^{M} a_{ij}^{(2)}(q') x_j - r - a r\, a_{i1}(q'), \quad i = 1, 2, \ldots, M,$

where $(a_{ij}^{(2)}(q')) = A(q')^2$.

Since, by hypothesis, $a_{i1}(q') \ge d > 0$, we obtain upon
reverting to matrix notation

(7)  $x \le b(q') + A(q') b(q') + A(q')^2 x - r (2 + a d) \mathbf{1}.$

Let us now iterate this inequality N times. The result is

(8)  $x \le b(q') + A(q') b(q') + \cdots + A(q')^{2N-1} b(q') + A(q')^{2N} x - N r (2 + a d) \mathbf{1},$

upon recalling that $A(q')^2 \mathbf{1} = \mathbf{1}$.

The maximal property of $r$, i.e. $r(q') \le r$, asserts that the vector

(9)  $b(q') + A(q') b(q') + \cdots + A(q')^{2N-1} b(q') - N r (2 + a d) \mathbf{1}$

becomes an arbitrarily large negative vector as $N$ increases.
This contradicts (8) for sufficiently large $N$.

Hence (2) is equivalent to (1).
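
The equivalence of (1) and (2) can be illustrated on a small example by brute force. The sketch below is not the paper's method of proof. It assumes a finite decision set with the decision allowed to differ from state to state, evaluates r and x for every such policy using the partial sums of §3, selects the policy with the largest r, and then measures how far that pair (r, x) is from satisfying the non-linear system (2). The function `evaluate` and the data are hypothetical.

```python
from itertools import product


def evaluate(policy, b, A, n=200):
    """Return (r, x) for the linear system (1) under a fixed policy.

    policy[i] is the decision used in state i.
    """
    M = len(policy)
    rows = [A[policy[i]][i] for i in range(M)]
    y = [b[policy[i]][i] for i in range(M)]
    pi = [1.0 / M] * M                      # stationary row vector
    for _ in range(n):
        pi = [sum(pi[i] * rows[i][j] for i in range(M)) for j in range(M)]
    r = sum(p * v for p, v in zip(pi, y))
    term, total = list(y), [0.0] * M        # partial sums of rows^k y
    for _ in range(n):
        total = [t + u for t, u in zip(total, term)]
        term = [sum(a * v for a, v in zip(row, term)) for row in rows]
    return r, [t - n * r for t in total]


# Hypothetical data: b[q][i] stands for b_i(q), A[q][i][j] for a_ij(q).
b = [[1.0, 0.0], [0.5, 0.5]]
A = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8]],
]
M, Q = 2, 2
# Pick the policy with the largest r by brute force.
best = max(product(range(Q), repeat=M), key=lambda p: evaluate(p, b, A)[0])
r, x = evaluate(best, b, A)
# Gap between the two sides of the non-linear system (2), per state.
gap = [
    r + x[i] - max(b[q][i] + sum(A[q][i][j] * x[j] for j in range(M))
                   for q in range(Q))
    for i in range(M)
]
```

For this data the maximizing policy uses decision 0 in the first state and decision 1 in the second, with r = 9/14, and the entries of `gap` should vanish up to rounding error.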

§5. Proof of Theorem II.


It is now easy to complete the proof of the theorem. Let
$r$ and $x_i$ be the quantities defined above. We wish to show that
$f_N(i)$ satisfies the inequality

(1)  $N r + x_i - k \le f_N(i) \le N r + x_i + k,$

for $i = 1, 2, \ldots, M$ and $N = 1, 2, \ldots$, with a suitable choice of $k$.

The proof is inductive. Choose $k$ so that the inequalities
hold for $N = 0$. Suppose that they are valid for $n = 0, 1, \ldots, N$.

Then (1.1) yields

(2)  $f_{N+1}(i) \le \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) (N r + x_j + k) \Big]$

     $= N r + k + \max_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) x_j \Big]$

     $\le N r + k + r + x_i = (N+1) r + x_i + k.$

The lower bound is established in the same manner.

The inequalities in (1) above yield the desired asymptotic
behavior, and even a more precise result.
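
The bound (1) can be observed numerically. In the sketch below, which is illustrative only, the data are hypothetical and the values r = 9/14 and x = (25/49, -10/49) were worked out by hand for that data as a solution of the system (4.2). Following the inductive argument, k is chosen so that the bound holds at N = 0, and the largest deviation over later N is then recorded.

```python
def bellman_step(f_prev, b, A):
    """One application of recurrence (1.1) with a finite decision set."""
    M = len(f_prev)
    return [
        max(
            b[q][i] + sum(A[q][i][j] * f_prev[j] for j in range(M))
            for q in range(len(b))
        )
        for i in range(M)
    ]


# Hypothetical data: b[q][i] stands for b_i(q), A[q][i][j] for a_ij(q).
b = [[1.0, 0.0], [0.5, 0.5]]
A = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8]],
]
# Hand-computed solution of r + x_i = max_q [b_i(q) + sum_j a_ij(q) x_j].
r = 9.0 / 14.0
x = [25.0 / 49.0, -10.0 / 49.0]

f = [0.0, 0.0]                                   # f_0(i) = 0
k = max(abs(f[i] - x[i]) for i in range(2))      # bound valid for N = 0
worst = 0.0
for N in range(1, 101):
    f = bellman_step(f, b, A)
    worst = max(worst, max(abs(f[i] - N * r - x[i]) for i in range(2)))
```

If the induction is carried out correctly, `worst` should never exceed `k`, apart from rounding error.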


§6. Discussion.

As in the theory of Markoff processes, the condition $a_{ij}(q) \ge d > 0$
can be considerably relaxed at the expense of more detailed
discussion. However, as the study of Markoff processes shows, it
cannot be relaxed to mere non-negativity. The essential
restriction is that the equations describe one interlinked
system, rather than two independent systems arbitrarily considered
as one system. The condition $a_{ij}(q) \ge d > 0$ is one way of
ensuring this, but clearly there are many others. The simplest,
perhaps, are those obtained from powers of the matrix, i.e.

$a_{ij}^{(n)}(q) \ge d > 0$, where $A(q)^n = (a_{ij}^{(n)}(q))$.
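
A condition on powers of the matrix is easy to test for a given non-negative matrix. The following sketch is an illustration under stated assumptions: the function name `first_positive_power` and the cutoff `max_power` are hypothetical choices, and a result of `None` only means that no positive power was found up to that cutoff.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def first_positive_power(A, max_power=50):
    """Smallest n <= max_power with every entry of A^n positive, else None."""
    P = [row[:] for row in A]
    for n in range(1, max_power + 1):
        if all(entry > 0 for row in P for entry in row):
            return n
        P = mat_mul(P, A)
    return None


linked = [[0.0, 1.0], [0.5, 0.5]]      # has a zero entry, but A^2 is positive
separate = [[1.0, 0.0], [0.0, 1.0]]    # two independent one-state systems
```

Here `first_positive_power(linked)` returns 2, while `first_positive_power(separate)` returns `None`, matching the distinction between one interlinked system and two independent ones.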


§7. A Dynamic Programming Process.

Let us now briefly describe a dynamic programming process,
[2], which gives rise to recurrence relations of the type
considered above.

Consider a machine which is used repeatedly to produce a
certain type of item. At each stage, there is a probability
that the machine produces a perfect item, a probability that it
produces a defective item, and a probability that the machine
breaks down and requires repair. These probabilities depend
upon the age of the machine.

Examining the matter in more detail, let us suppose that
there are $k$ different sources of failure within the machine,
leading to either defective items or breakdown of the machine,
or both. Let us define the following probabilities:

(1)  $p_i(n)$ = probability that a machine breaks down due to
       failure at the i-th source after it has
       successfully produced n items,

     and the probability that a defective item will be
       produced due to hidden failure at the i-th
       source after n items have been successfully
       produced.

At any particular stage, we face the problem of deciding
whether to examine the machine for possible failure at one of
the sources of trouble, or to wait until a defective item is
produced. In addition, if a defective item is produced, there
is the question of whether we should repair the machine insofar
as the immediate source of failure is concerned, whether we
should in addition examine other potential sources of failure,
or whether we should automatically provide new parts at various
sources of failure, without preliminary inspection.

P-1066
4-18-57

The decisions, of course, will be dependent upon the costs
incurred in carrying out these operations, the costs due to
defective parts, and the costs due to breakdown of the machine.

The state of the system at any time can be characterized
by the set of numbers $(n_1, n_2, \ldots, n_k)$ specifying the number of
items $n_i$ produced since the i-th source of trouble was examined.

The problem is then to determine the inspection and replacement
policy which minimizes the expected unit cost of production.

To  treat  this  problem,  we  begin  with  the  problem  of  determining 
the  policy  which  minimizes  the  expected  cost  of  producing  N  items. 
Define  the  sequence  of  functions 

$f_N(n_1, n_2, \ldots, n_k)$ = expected cost of producing N items using
    an optimal inspection and replacement
    policy starting in state $(n_1, n_2, \ldots, n_k)$.

In view of the above discussion, it follows that the effect
of any decision, to produce without inspection or repair, to
inspect with possible replacement, or to replace, is to transform
the system from its present state into another state. Assume that
only a finite number of states are permissible, and enumerate
them in some order, $i = 1, 2, \ldots, M$. Any particular decision,
designated by $q$, leads to a recurrence relation

(2)  $f_N(i) = b_i(q) + \sum_{j=1}^{M} a_{ij}(q) f_{N-1}(j).$

The principle of optimality, cf. [2], asserts that $q$ is chosen
so as to yield the equation

(3)  $f_N(i) = \min_q \Big[ b_i(q) + \sum_{j=1}^{M} a_{ij}(q) f_{N-1}(j) \Big].$
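
The mechanics of the minimizing recurrence (3) can be sketched on a toy machine. Everything in the example below is hypothetical: the three age classes, the two decisions (keep producing, replace), the costs, and the transition probabilities are invented for illustration. The transition matrices contain zeros, so this data does not satisfy the positivity condition of §2 and is meant only to show how the recurrence is evaluated.

```python
def min_step(f_prev, b, A):
    """One application of recurrence (3); returns (f_N, minimizing decisions)."""
    M = len(f_prev)
    values, choices = [], []
    for i in range(M):
        costs = [
            b[q][i] + sum(A[q][i][j] * f_prev[j] for j in range(M))
            for q in range(len(b))
        ]
        best_q = min(range(len(costs)), key=costs.__getitem__)
        values.append(costs[best_q])
        choices.append(best_q)
    return values, choices


# Hypothetical machine: state i = age class 0, 1, 2.
# Decision 0 = keep producing, decision 1 = replace.
b = [
    [1.0, 2.0, 4.0],   # running cost grows with age
    [3.0, 3.0, 3.0],   # flat replacement cost
]
A = [
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.1, 0.0, 0.9]],   # keep
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # replace
]
f = [0.0, 0.0, 0.0]
history, policies = [f], []
for _ in range(50):
    f, policy = min_step(f, b, A)
    history.append(f)
    policies.append(policy)
```

With one stage remaining, the minimizing decisions are to keep the machine in the two younger age classes and to replace it in the oldest, since only there does the running cost exceed the replacement cost.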

The theorem stated in §2 and proved above asserts that there is a
steady-state optimal policy to which we converge as $N \to \infty$, provided
that the $a_{ij}(q)$ satisfy certain restrictions.

As we have discussed above, the natural condition is that the
system be interlinked, i.e. not separable into two distinct
systems.

Particular examples of processes of the above general type
are discussed in [4] and [5].